In this part of my project I will refine my research questions and further examine the effects of the pandemic on recent MCPS high school graduates enrolled at Montgomery College. For the purposes of this study I will limit my dataset to MCPS students aged 20 and younger. These students will be divided further into subgroups based on gender and race. The datasets used in this part of my project were already cleaned in my initial data analysis. Outliers have not been removed; I will conduct my statistical analysis both with and without them.
For the purposes of this project, the following variables and definitions are important.
The population in this dataset is the incoming cohorts of students in Fall 2019 and Fall 2020. These students are first-time degree or certificate seekers with no prior tertiary education. They may have earned AP credits in high school.
Fall2019 refers to the incoming freshman cohort in Fall 2019. This is term year 2020.
Fall2020 refers to the incoming freshman cohort in Fall 2020. This is term year 2021.
Variables of Interest:
term_year: Incoming students in Fall 2019 are assigned to term year 2020. Incoming students in Fall 2020 are assigned to term year 2021.
hours_earned: Credit hours the student has earned in their first Fall semester (this can include credits earned in the second Summer school session, Summer 1, and AP credits earned in high school).
hours_attempted: Credit and non-credit hours the student has attempted in their first Fall semester (this may include credits attempted in the second Summer school session, Summer 1).
full_part: Whether the student is full-time (FT) or part-time (PT). Part-time students are registered for fewer than 12 credit hours. Full-time students take at least 12 credits.
major: Degree programme the student is registered for, or Certificate & LR (letter of recommendation). All certificates and letters of recommendation have been grouped together.
hours_earned_rate: Ratio of hours_earned to hours_attempted.
age: Age of the student at the start of the program.
race: Racial classification of the student.
sex: Gender classification of the student.
high_school: Name of the high school the student graduated from. Public high schools in Montgomery County are classified as MCPS.
pell_grant: Whether or not the student receives a Pell grant.
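As a minimal sketch of how hours_earned_rate relates to the two raw columns, the following base R snippet recomputes the ratio on a small made-up data frame. The data frame `toy` and its values are illustrative assumptions and are not taken from the dataset.

```r
# Toy data (hypothetical values, not taken from df_Degrees)
toy <- data.frame(
  hours_attempted = c(15, 12, 9, 6),
  hours_earned    = c(15, 9, 12, 0)
)

# Ratio of hours earned to hours attempted, as defined above
toy$hours_earned_rate <- toy$hours_earned / toy$hours_attempted

print(toy)
```

The third row shows how the rate can exceed 1 when hours_earned includes credits, such as AP credits, that are not part of hours_attempted.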
Summary of Data and Types
skim(df_Degrees)
| Name | df_Degrees |
| Number of rows | 7123 |
| Number of columns | 24 |
| _______________________ | |
| Column type frequency: | |
| character | 15 |
| logical | 1 |
| numeric | 8 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| sex | 0 | 1.00 | 1 | 1 | 0 | 4 | 0 |
| race | 0 | 1.00 | 5 | 22 | 0 | 9 | 0 |
| age | 0 | 1.00 | 4 | 7 | 0 | 5 | 0 |
| high_school | 0 | 1.00 | 7 | 30 | 0 | 163 | 0 |
| full_part | 0 | 1.00 | 2 | 2 | 0 | 2 | 0 |
| city | 19 | 1.00 | 5 | 19 | 0 | 127 | 0 |
| stat_code | 19 | 1.00 | 2 | 2 | 0 | 16 | 0 |
| pell_grant | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| camp_code | 140 | 0.98 | 1 | 1 | 0 | 6 | 0 |
| major | 0 | 1.00 | 1 | 61 | 0 | 34 | 0 |
| pass_engl | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| pass_math | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| summer1 | 0 | 1.00 | 1 | 1 | 0 | 1 | 0 |
| fall | 0 | 1.00 | 1 | 1 | 0 | 1 | 0 |
| HS_classify | 0 | 1.00 | 2 | 14 | 0 | 7 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| MCPS | 0 | 1 | 0.7 | TRU: 4963, FAL: 2160 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| u_number | 0 | 1 | 20196625.60 | 5027.06 | 20190001 | 20191872.50 | 20193733.00 | 20201703.5 | 20203588.0 | ▇▃▁▂▇ |
| zip | 19 | 1 | 20886.64 | 1559.40 | 1460 | 20853.00 | 20877.00 | 20903.0 | 94025.0 | ▁▇▁▁▁ |
| hours_attempted | 0 | 1 | 12.46 | 6.23 | 1 | 9.00 | 12.00 | 15.0 | 54.0 | ▆▇▁▁▁ |
| hours_earned | 0 | 1 | 7.85 | 7.43 | 0 | 3.00 | 6.00 | 12.0 | 54.0 | ▇▃▁▁▁ |
| mc_gpa | 0 | 1 | 2.19 | 1.47 | 0 | 0.67 | 2.50 | 3.5 | 4.0 | ▆▂▃▅▇ |
| term_year | 0 | 1 | 2020.47 | 0.50 | 2020 | 2020.00 | 2020.00 | 2021.0 | 2021.0 | ▇▁▁▁▇ |
| hours_earned_rate | 0 | 1 | 0.57 | 0.38 | 0 | 0.23 | 0.64 | 1.0 | 3.2 | ▇▇▁▁▁ |
| unearned_hours | 0 | 1 | 4.61 | 4.24 | -22 | 0.00 | 4.00 | 7.0 | 25.0 | ▁▁▇▂▁ |
Change Datatypes
df_Degrees$u_number<- as.character(df_Degrees$u_number)
df_Degrees$term_year<- as.character(df_Degrees$term_year)
Use the dataframe df_Degrees, which was cleaned in the initial data analysis. Keep only MCPS students who are 20 years old or younger.
df_MCPS20D<-df_Degrees %>%
filter(HS_classify=="MCPS")%>% # filter degrees dataset to obtain students who graduated MCPS highschools
filter(age=='18 - 20' | age =="< 18") # filter students who are 20yrs old and younger.
Frequency of Students Part time versus Full time: 2020 vs 2021
# Number of students part time and full time 2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=full_part, fill=full_part)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=2,size=3)+
facet_wrap(~term_year)+
ggtitle("Number of Students Full time versus Part time")+
ylab('Frequency')+
xlab("")+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
# change in overall MCPS student population from 2020 to 2021
df_MCPS20D%>%
group_by(term_year,full_part)%>%
count(full_part)%>%
group_by(term_year)%>%
mutate(total_pop =sum(n))%>%
group_by(full_part)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 4 x 5
## # Groups: full_part [2]
## term_year full_part n total_pop pct_change
## <chr> <chr> <int> <int> <dbl>
## 1 2020 FT 1655 2456 NA
## 2 2021 FT 1556 2303 -5.98
## 3 2020 PT 801 2456 NA
## 4 2021 PT 747 2303 -6.74
There was a 5.98% decrease in full-time students who graduated from MCPS high schools in term year 2021 compared with term year 2020. There was a 6.74% decrease in part-time students who graduated from MCPS.
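The pct_change column is the difference between the 2021 and 2020 counts divided by the 2020 count, multiplied by 100. The first year in each group is NA because lag(n) has no earlier row to compare against. A minimal base R sketch of the same arithmetic, using the counts printed in the table above (the helper name `pct_change` is an assumption made for illustration):

```r
# Percentage change from an earlier count to a later count
pct_change <- function(n_prev, n_curr) {
  (n_curr - n_prev) / n_prev * 100
}

# Counts copied from the table above
ft <- pct_change(1655, 1556)  # full-time, term year 2020 -> 2021
pt <- pct_change(801, 747)    # part-time, term year 2020 -> 2021

round(c(FT = ft, PT = pt), 2)
```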
Count of Race Groups
ggplot(data=df_MCPS20D, aes(x=race, fill=race)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0,size=3)+
facet_wrap(~term_year + full_part)+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())+
ggtitle("Number of Students per Race Group")+
xlab("Race")+
ylab("Frequency")
Full time student: Change in enrollment from 2020 to 2021 based on Race
# calculate percentage change in full time student enrollment from 2020 to 2021 by race
df_MCPS20D%>%
filter(full_part=="FT")%>%
group_by(term_year,race)%>%
count(race)%>%
group_by(race)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 18 x 4
## # Groups: race [9]
## term_year race n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native 5 NA
## 2 2021 Am. Indian / AK Native 1 -80
## 3 2020 Asian 272 NA
## 4 2021 Asian 227 -16.5
## 5 2020 Black / African Am. 389 NA
## 6 2021 Black / African Am. 326 -16.2
## 7 2020 Foreign 103 NA
## 8 2021 Foreign 96 -6.80
## 9 2020 Hawaiian / Pac. Isl. 5 NA
## 10 2021 Hawaiian / Pac. Isl. 3 -40
## 11 2020 Hispanic 534 NA
## 12 2021 Hispanic 596 11.6
## 13 2020 Multi-Race 71 NA
## 14 2021 Multi-Race 63 -11.3
## 15 2020 Unknown 11 NA
## 16 2021 Unknown 3 -72.7
## 17 2020 White 265 NA
## 18 2021 White 241 -9.06
Full time students: There was a 16.5% decline in Asian students, a 16.2% decline in African American students, a 9.1% decline in white students and a 6.8% decline in foreign students. Hispanic students increased by 11.6%.
Part time student: Change in enrollment from 2020 to 2021 based on Race
# calculate percentage change in part time student enrollment from 2020 to 2021 by race
df_MCPS20D%>%
filter(full_part=="PT")%>%
group_by(term_year,race)%>%
count(race)%>%
group_by(race)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 18 x 4
## # Groups: race [9]
## term_year race n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native 4 NA
## 2 2021 Am. Indian / AK Native 1 -75
## 3 2020 Asian 69 NA
## 4 2021 Asian 63 -8.70
## 5 2020 Black / African Am. 177 NA
## 6 2021 Black / African Am. 181 2.26
## 7 2020 Foreign 73 NA
## 8 2021 Foreign 54 -26.0
## 9 2020 Hawaiian / Pac. Isl. 1 NA
## 10 2021 Hawaiian / Pac. Isl. 1 0
## 11 2020 Hispanic 327 NA
## 12 2021 Hispanic 263 -19.6
## 13 2020 Multi-Race 33 NA
## 14 2021 Multi-Race 35 6.06
## 15 2020 Unknown 5 NA
## 16 2021 Unknown 2 -60
## 17 2020 White 112 NA
## 18 2021 White 147 31.2
Part time students: There was an 8.7% decrease in Asian students, a 26% decrease in foreign students, a 2.3% increase in African American students and a 19.6% decrease in Hispanic students. There was a 31.25% increase in white students.
Gender of Students
# Gender of students part time and full time 2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=sex, fill=sex)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=1,size=3)+
facet_wrap(~term_year+full_part)+
ggtitle("Gender of Students: Full time versus Part time")+
ylab('Frequency')+
xlab("")+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
Calculate percentage change in full time student enrollment from 2020 to 2021 by gender
# calculate percentage change in full time student enrollment from 2020 to 2021 by gender
df_MCPS20D%>%
filter(full_part=="FT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,sex)%>%
count(sex)%>%
group_by(sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 4 x 4
## # Groups: sex [2]
## term_year sex n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 F 793 NA
## 2 2021 F 819 3.28
## 3 2020 M 842 NA
## 4 2021 M 719 -14.6
Full time students: There was a 14.6% decrease in male students and a 3.28% increase in female students.
Calculate percentage change in part time student enrollment from 2020 to 2021 by gender
# calculate percentage change in part time student enrollment from 2020 to 2021 by gender
df_MCPS20D%>%
filter(full_part=="PT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,sex)%>%
count(sex)%>%
group_by(sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 4 x 4
## # Groups: sex [2]
## term_year sex n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 F 381 NA
## 2 2021 F 345 -9.45
## 3 2020 M 401 NA
## 4 2021 M 395 -1.50
Part time students: There was a 9.5% decrease in female students and a 1.5% decrease in male students.
Gender and Race breakdown of full time students
# Gender and Race of full time students 2020 vs 2021
df_MCPS20D%>%
filter(sex %in% c("F","M"))%>%
filter(full_part=="FT")%>%
ggplot(., aes(x=race, fill=race)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, size=3)+
facet_wrap(~term_year+sex)+
ggtitle("Gender and Race of Full time Students")+
ylab('Frequency')+
xlab("")+
theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
Full time Student Enrollment Percentages trend by Gender and race
# calculate percentage change in student enrollment from 2020 to 2021 by race and gender
# create data frames with counts of full time students by race and gender
df_MCPS20D%>%
filter(full_part=="FT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,race,sex)%>%
count(sex)%>%
group_by(race,sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 35 x 5
## # Groups: race, sex [18]
## term_year race sex n pct_change
## <chr> <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native F 4 NA
## 2 2020 Am. Indian / AK Native M 1 NA
## 3 2021 Am. Indian / AK Native M 1 0
## 4 2020 Asian F 111 NA
## 5 2021 Asian F 115 3.60
## 6 2020 Asian M 159 NA
## 7 2021 Asian M 110 -30.8
## 8 2020 Black / African Am. F 178 NA
## 9 2021 Black / African Am. F 169 -5.06
## 10 2020 Black / African Am. M 202 NA
## # … with 25 more rows
Part time Student Enrollment Percentages trend by Gender and race
# calculate percentage change in part time student enrollment from 2020 to 2021 by race and gender
# count part time students by race and gender
df_MCPS20D%>%
filter(full_part=="PT")%>%
filter(sex=="F"|sex =="M")%>%
group_by(term_year,race,sex)%>%
count(sex)%>%
group_by(race,sex)%>%
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 31 x 5
## # Groups: race, sex [17]
## term_year race sex n pct_change
## <chr> <chr> <chr> <int> <dbl>
## 1 2020 Am. Indian / AK Native M 4 NA
## 2 2021 Am. Indian / AK Native M 1 -75
## 3 2020 Asian F 30 NA
## 4 2021 Asian F 19 -36.7
## 5 2020 Asian M 37 NA
## 6 2021 Asian M 44 18.9
## 7 2020 Black / African Am. F 79 NA
## 8 2021 Black / African Am. F 84 6.33
## 9 2020 Black / African Am. M 96 NA
## 10 2021 Black / African Am. M 94 -2.08
## # … with 21 more rows
Overall Majors trend
Count of Majors in Full time students in 2020
z1<- df_MCPS20D%>%
filter(full_part=="FT" &term_year =="2020")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Full-time Students in 2020 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z1 + coord_flip()
Count of Majors in Full time students in 2021
z13<- df_MCPS20D%>%
filter(full_part=="FT" &term_year =="2021")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Full-time Students in 2021 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z13 + coord_flip()
Calculate percentage change in full time student majors from 2020 to 2021
df_MCPS20D%>%
filter(full_part=="FT")%>%
group_by(term_year,major)%>%
count(major)%>%
group_by(major)%>% # regroup by major so lag() compares 2020 with 2021 within each major
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 62 x 4
## # Groups: major [33]
## term_year major n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 0 3 NA
## 2 2021 0 2 -33.3
## 3 2020 American Sign Language 5 NA
## 4 2021 American Sign Language 1 -80
## 5 2020 Applied Geography 1 NA
## 6 2021 Applied Geography 2 100
## 7 2020 Architectural Technology 15 NA
## 8 2021 Architectural Technology 19 26.7
## 9 2020 Art 24 NA
## 10 2021 Art 22 -8.33
## # … with 52 more rows
Count of Majors in Part time students in 2020
z11<- df_MCPS20D%>%
filter(full_part=="PT" &term_year =="2020")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Part-time Students in 2020 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z11 + coord_flip()
Count of Majors in Part time students in 2021
z12<- df_MCPS20D%>%
filter(full_part=="PT" &term_year =="2021")%>%
ggplot(., aes(x=major, fill=major)) +
geom_bar() +
geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
ggtitle("Majors of Part-time Students in 2021 ")+
xlab("Major")+
ylab("Frequency")+
theme(legend.position = "none")
z12 + coord_flip()
Calculate percentage change in part time student majors from 2020 to 2021
df_MCPS20D%>%
filter(full_part=="PT")%>%
group_by(term_year,major)%>%
count(major)%>%
group_by(major)%>% # regroup by major so lag() compares 2020 with 2021 within each major
arrange(term_year,.by_group=TRUE)%>%
mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 60 x 4
## # Groups: major [32]
## term_year major n pct_change
## <chr> <chr> <int> <dbl>
## 1 2020 0 5 NA
## 2 2020 American Sign Language 1 NA
## 3 2021 American Sign Language 2 100
## 4 2020 Applied Geography 2 NA
## 5 2020 Architectural Technology 13 NA
## 6 2021 Architectural Technology 4 -69.2
## 7 2020 Art 12 NA
## 8 2021 Art 14 16.7
## 9 2020 Broadcast Media 5 NA
## 10 2021 Broadcast Media 4 -20
## # … with 50 more rows
I will run the analysis first with the outliers included and then again after removing them.
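The text does not say which rule will be used to identify outliers. As one possible approach, the sketch below assumes the 1.5 × IQR convention that boxplots use to mark outlying points, and compares a summary statistic with and without the flagged values. The helper `is_outlier` and the vector `hours` are hypothetical and only illustrate the idea.

```r
# Flag values outside the 1.5 * IQR fences (assumed rule, not stated in the text)
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Hypothetical hours_attempted values for illustration only
hours <- c(12, 13, 13, 14, 15, 15, 16, 52)

with_outliers    <- hours
without_outliers <- hours[!is_outlier(hours)]

mean(with_outliers)     # mean with the extreme value included
mean(without_outliers)  # mean after dropping the flagged value
```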
Boxplots of hours_attempted by year by MCPS students 20yrs and younger
p11 = ggplot(df_MCPS20D, aes(hours_attempted))
p11 + geom_boxplot(aes(colour = term_year)) +
facet_wrap(~full_part)
Students who register for more than 18 credits require special permission from the department. Furthermore, a full-time student is classified as someone who is enrolled in 12 or more credits, and a part-time student as someone who is enrolled in fewer than 12 credits. However, based on the dataset, a number of full-time students attempt fewer than 12 credits and a large number of part-time students attempt more than 12 hours.
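One way to quantify this mismatch is to compare the recorded full_part label with the label implied by the 12-credit rule. The sketch below does this on made-up records. The column names mirror df_MCPS20D, while `toy`, `expected` and `mismatch` are hypothetical names introduced for illustration.

```r
# Hypothetical records; the column names mirror df_MCPS20D
toy <- data.frame(
  full_part       = c("FT", "FT", "FT", "PT", "PT", "PT"),
  hours_attempted = c(15, 12, 9, 6, 11, 14)
)

# Label implied by the 12-credit rule
toy$expected <- ifelse(toy$hours_attempted >= 12, "FT", "PT")

# A record is inconsistent when the recorded label disagrees with the rule
toy$mismatch <- toy$full_part != toy$expected

# Cross-tabulate the recorded label against the mismatch flag
table(recorded = toy$full_part, mismatch = toy$mismatch)
```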
Boxplots of hours_attempted by year by Full time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of hours_attempted by year by Part time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.
Density plot of hours_attempted by year
ggplot(df_MCPS20D, aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~full_part)+
xlab("Hours attempted") +
ylab( "Density")+
ggtitle(" Hours Attempted by Full-time Students vs Part-time Students")
Hours attempted by full time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours attempted") +
ylab( "Density") +
ggtitle(" Hours Attempted by Full-time Students")
Fivenum Summary of Full time students
df_MCPS20D%>% filter(full_part=="FT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_attempted)[1],
Q1 = fivenum(hours_attempted)[2],
median = fivenum(hours_attempted)[3],
Q3 = fivenum(hours_attempted)[4],
max = fivenum(hours_attempted)[5],
mean= mean(hours_attempted),
sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK N… 2020 5 6 13 17 19 36 18.2 11.1
## 2 Am. Indian / AK N… 2021 1 13 13 13 13 13 13 NA
## 3 Asian 2020 272 6 13 15 20 52 17.7 8.13
## 4 Asian 2021 227 7 13 15 17 46 16.8 7.00
## 5 Black / African A… 2020 389 5 12 13 14 42 13.7 3.53
## 6 Black / African A… 2021 326 4 12 14 16 38 14.9 4.31
## 7 Foreign 2020 103 7 12 14 17 31 14.8 4.26
## 8 Foreign 2021 96 7 12.5 15 16 37 15.7 5.29
## 9 Hawaiian / Pac. I… 2020 5 9 12 13 13 15 12.4 2.19
## 10 Hawaiian / Pac. I… 2021 3 12 15.5 19 24.5 30 20.3 9.07
## 11 Hispanic 2020 534 4 12 13 15 39 14.2 4.43
## 12 Hispanic 2021 596 3 12 14 16 43 15.0 4.33
## 13 Multi-Race 2020 71 6 12 13 17 44 16.7 8.04
## 14 Multi-Race 2021 63 6 12 14 16.5 43 15.7 6.22
## 15 Unknown 2020 11 9 12 14 15 31 15 5.78
## 16 Unknown 2021 3 12 12 12 13 14 12.7 1.15
## 17 White 2020 265 8 12 13 16 46 15.9 7.08
## 18 White 2021 241 7 13 14 17 54 16.5 6.37
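Two details of the summary above are worth keeping in mind. The Q1 and Q3 columns come from fivenum(), which returns Tukey's hinges, and these can differ slightly from the default quartiles given by quantile(). In addition, sd is NA for any group that contains a single student. A small base R illustration with made-up values:

```r
# Four illustrative values (hypothetical, not from the dataset)
x <- c(1, 2, 3, 4)

fivenum(x)                  # min, lower hinge, median, upper hinge, max
quantile(x, c(0.25, 0.75))  # default (type 7) quartiles

# A group with a single observation has no sample standard deviation
sd(13)
```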
Hours attempted by part time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours attempted") +
ylab( "Density")+
ggtitle(" Hours Attempted by Part-time Students")
Fivenum Summary of Part time students
df_MCPS20D%>% filter(full_part=="PT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_attempted)[1],
Q1 = fivenum(hours_attempted)[2],
median = fivenum(hours_attempted)[3],
Q3 = fivenum(hours_attempted)[4],
max = fivenum(hours_attempted)[5],
mean= mean(hours_attempted),
sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK N… 2020 4 3 3 5.5 8.5 9 5.75 3.20
## 2 Am. Indian / AK N… 2021 1 6 6 6 6 6 6 NA
## 3 Asian 2020 69 2 6 9 10 33 8.62 4.94
## 4 Asian 2021 63 3 7.5 9 11 21 8.90 3.64
## 5 Black / African A… 2020 177 1 6 7 9 15 7.28 2.62
## 6 Black / African A… 2021 181 1 6 8 10 25 7.80 3.37
## 7 Foreign 2020 73 3 6 8 10 23 8.18 3.89
## 8 Foreign 2021 54 3 5 9 10 29 8.61 4.38
## 9 Hawaiian / Pac. I… 2020 1 6 6 6 6 6 6 NA
## 10 Hawaiian / Pac. I… 2021 1 5 5 5 5 5 5 NA
## 11 Hispanic 2020 327 1 6 8 9 21 7.84 3.06
## 12 Hispanic 2021 263 1 6 8 11 42 8.73 4.41
## 13 Multi-Race 2020 33 1 4 8 9 12 7.03 2.98
## 14 Multi-Race 2021 35 3 6 9 10 26 8.34 3.90
## 15 Unknown 2020 5 7 9 10 10 10 9.2 1.30
## 16 Unknown 2021 2 4 4 6.5 9 9 6.5 3.54
## 17 White 2020 112 1 6 8 10 33 8.15 4.47
## 18 White 2021 147 3 5 8 10 39 8.43 5.02
Boxplots of Hours Earned by year by MCPS students 20yrs and younger
p11 = ggplot(df_MCPS20D, aes(hours_earned))
p11 + geom_boxplot(aes(colour = term_year)) +
facet_wrap(~full_part)
Boxplots of hours_earned by year by Full time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of hours_earned by year by Part time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.
Density plot of hours_earned by year
ggplot(df_MCPS20D, aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~full_part)+
xlab("Hours Earned") +
ylab( "Density")+
ggtitle(" Hours Earned by Full-time vs Part-time Students")
Hours earned by full time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned") +
ylab( "Density")+
ggtitle(" Hours Earned by Full-time Students")
Fivenum Summary of Full time students
df_MCPS20D%>% filter(full_part=="FT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_earned)[1],
Q1 = fivenum(hours_earned)[2],
median = fivenum(hours_earned)[3],
Q3 = fivenum(hours_earned)[4],
max = fivenum(hours_earned)[5],
mean= mean(hours_earned),
sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK N… 2020 5 0 10 14 19 36 15.8 13.3
## 2 Am. Indian / AK N… 2021 1 13 13 13 13 13 13 NA
## 3 Asian 2020 272 0 9 13 17 52 14.7 9.15
## 4 Asian 2021 227 0 9 12 16 46 13.4 8.53
## 5 Black / African A… 2020 389 0 6 9 12 42 8.85 5.58
## 6 Black / African A… 2021 326 0 6 9 13 37 9.55 6.53
## 7 Foreign 2020 103 0 6 9 13 31 10.4 6.44
## 8 Foreign 2021 96 0 6 10 13 37 10.6 7.56
## 9 Hawaiian / Pac. I… 2020 5 0 0 9 12 13 6.8 6.38
## 10 Hawaiian / Pac. I… 2021 3 9 12.5 16 23 30 18.3 10.7
## 11 Hispanic 2020 534 0 6 9 12 38 9.57 6.50
## 12 Hispanic 2021 596 0 6 10 13 33 9.73 6.36
## 13 Multi-Race 2020 71 0 7 12 15 44 13.4 9.76
## 14 Multi-Race 2021 63 0 6 10 13.5 43 10.8 8.69
## 15 Unknown 2020 11 3 5 9 13 31 10.5 7.90
## 16 Unknown 2021 3 3 5 7 9.5 12 7.33 4.51
## 17 White 2020 265 0 7 11 15 46 12.3 8.88
## 18 White 2021 241 0 7 12 15 54 12.5 8.22
Hours earned by part time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned") +
ylab( "Density")+
ggtitle(" Hours Earned by Part-time Students")
Fivenum Summary of Part time students
df_MCPS20D%>% filter(full_part=="PT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(hours_earned)[1],
Q1 = fivenum(hours_earned)[2],
median = fivenum(hours_earned)[3],
Q3 = fivenum(hours_earned)[4],
max = fivenum(hours_earned)[5],
mean= mean(hours_earned),
sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK … 2020 4 0 1.5 3 3 3 2.25 1.5
## 2 Am. Indian / AK … 2021 1 3 3 3 3 3 3 NA
## 3 Asian 2020 69 0 0 4 6 33 4.81 5.50
## 4 Asian 2021 63 0 3 3 6 21 4.73 4.48
## 5 Black / African … 2020 177 0 0 3 4 11 2.73 2.82
## 6 Black / African … 2021 181 0 0 1 6 22 2.81 3.72
## 7 Foreign 2020 73 0 0 3 6 21 3.96 4.60
## 8 Foreign 2021 54 0 0 0 6 29 3.09 5.04
## 9 Hawaiian / Pac. … 2020 1 0 0 0 0 0 0 NA
## 10 Hawaiian / Pac. … 2021 1 3 3 3 3 3 3 NA
## 11 Hispanic 2020 327 0 0 3 6 21 3.48 3.84
## 12 Hispanic 2021 263 0 0 3 6 42 4.65 5.14
## 13 Multi-Race 2020 33 0 1 3 9 11 4.27 3.83
## 14 Multi-Race 2021 35 0 0 3 6 26 4.11 4.95
## 15 Unknown 2020 5 0 1 1 4 9 3 3.67
## 16 Unknown 2021 2 3 3 3.5 4 4 3.5 0.707
## 17 White 2020 112 0 0 4 7 27 4.74 4.87
## 18 White 2021 147 0 3 4 7 33 5.16 5.14
Boxplots of GPA by year by MCPS students 20yrs and younger
p11 = ggplot(df_MCPS20D, aes(mc_gpa))
p11 + geom_boxplot(aes(colour = term_year)) +
facet_wrap(~full_part)
Boxplots of GPA by year by Full time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of GPA by year by Part time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Density plot of GPA by year
ggplot(df_MCPS20D, aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~full_part)+
xlab("GPA") +
ylab( "Density")+
ggtitle(" GPA by Full-time vs Part-time Students")
GPA by full time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("GPA") +
ylab( "Density")+
ggtitle(" GPA of Full-time Students")
Fivenum Summary of Full time students
df_MCPS20D%>% filter(full_part=="FT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(mc_gpa)[1],
Q1 = fivenum(mc_gpa)[2],
median = fivenum(mc_gpa)[3],
Q3 = fivenum(mc_gpa)[4],
max = fivenum(mc_gpa)[5],
mean= mean(mc_gpa),
sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK … 2020 5 0 2.35 2.9 3.5 4 2.55 1.55
## 2 Am. Indian / AK … 2021 1 2.77 2.77 2.77 2.77 2.77 2.77 NA
## 3 Asian 2020 272 0 2.33 3.3 3.73 4 2.93 1.03
## 4 Asian 2021 227 0 2.5 3.23 3.71 4 2.88 1.12
## 5 Black / African … 2020 389 0 1.5 2.5 3.14 4 2.25 1.18
## 6 Black / African … 2021 326 0 1.33 2.67 3.4 4 2.31 1.30
## 7 Foreign 2020 103 0 2 3 3.65 4 2.71 1.20
## 8 Foreign 2021 96 0 1.46 2.82 3.69 4 2.48 1.35
## 9 Hawaiian / Pac. … 2020 5 0 0 2.25 2.67 3.77 1.74 1.68
## 10 Hawaiian / Pac. … 2021 3 1.75 2.22 2.68 3.34 4 2.81 1.13
## 11 Hispanic 2020 534 0 1.5 2.70 3.44 4 2.38 1.25
## 12 Hispanic 2021 596 0 1.23 2.66 3.33 4 2.29 1.30
## 13 Multi-Race 2020 71 0 2 2.75 3.5 4 2.59 1.13
## 14 Multi-Race 2021 63 0 1.5 2.6 3.54 4 2.37 1.35
## 15 Unknown 2020 11 0.33 2.12 2.33 3.32 4 2.55 1.00
## 16 Unknown 2021 3 2.55 2.65 2.75 3.38 4 3.1 0.786
## 17 White 2020 265 0 1.8 3 3.6 4 2.59 1.22
## 18 White 2021 241 0 2 3 3.69 4 2.67 1.26
GPA of Part time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("GPA") +
ylab( "Density")+
ggtitle(" GPA of Part-time Students")
Fivenum Summary of Part time students
df_MCPS20D%>% filter(full_part=="PT")%>%
group_by(race,term_year)%>%
summarise(n = n(),
min = fivenum(mc_gpa)[1],
Q1 = fivenum(mc_gpa)[2],
median = fivenum(mc_gpa)[3],
Q3 = fivenum(mc_gpa)[4],
max = fivenum(mc_gpa)[5],
mean= mean(mc_gpa),
sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups: race [9]
## race term_year n min Q1 median Q3 max mean sd
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Am. Indian / AK … 2020 4 0 0.5 1.25 2.25 3 1.38 1.25
## 2 Am. Indian / AK … 2021 1 2 2 2 2 2 2 NA
## 3 Asian 2020 69 0 0 2.3 3.33 4 2.01 1.54
## 4 Asian 2021 63 0 0.8 2 3.28 4 1.94 1.48
## 5 Black / African … 2020 177 0 0 1.33 2.71 4 1.46 1.38
## 6 Black / African … 2021 181 0 0 0.33 2.33 4 1.13 1.32
## 7 Foreign 2020 73 0 0 2 3 4 1.65 1.51
## 8 Foreign 2021 54 0 0 0 2.67 4 1.20 1.46
## 9 Hawaiian / Pac. … 2020 1 0 0 0 0 0 0 NA
## 10 Hawaiian / Pac. … 2021 1 4 4 4 4 4 4 NA
## 11 Hispanic 2020 327 0 0 1.5 3 4 1.60 1.50
## 12 Hispanic 2021 263 0 0 2 3 4 1.73 1.43
## 13 Multi-Race 2020 33 0 0.67 2 3.5 4 1.99 1.51
## 14 Multi-Race 2021 35 0 0 2.5 3 4 1.80 1.55
## 15 Unknown 2020 5 0 0.75 2 3.67 4 2.08 1.75
## 16 Unknown 2021 2 3 3 3.5 4 4 3.5 0.707
## 17 White 2020 112 0 0 2 3.33 4 1.86 1.54
## 18 White 2021 147 0 0.55 2.5 3.33 4 2.16 1.49
Hours Earned Rate
Density plot of Hours Earned Rate by year
ggplot(df_MCPS20D, aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.3) +
facet_wrap(~full_part)+
xlab("Hours Earned Rate") +
ylab( "Density")+
xlim(0,1)
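The data summary shows hours_earned_rate reaching 3.2, so xlim(0, 1) excludes every student whose rate is above 1: ggplot2 removes out-of-range values, with a warning, before the density is estimated. The sketch below shows one way to check how many rows are affected. The vector `rate` holds hypothetical values.

```r
# Hypothetical rates; values above 1 arise when earned hours exceed attempted hours
rate <- c(0, 0.25, 0.64, 1, 1, 1.5, 3.2)

# Rows that fall outside the plotted range [0, 1]
outside <- rate < 0 | rate > 1

sum(outside)   # number of rows excluded from the plot
mean(outside)  # share of rows excluded
```

If the aim is only to zoom the axis without discarding data, coord_cartesian(xlim = c(0, 1)) is the usual ggplot2 alternative.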
Boxplots of Hours Earned Rate of Full time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Boxplots of Hours Earned Rate of Part time MCPS students 20yrs and younger
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate))+
geom_boxplot(aes(colour = term_year)) +
facet_wrap(~race)
Hours Earned Rate of full time students
df_MCPS20D%>%filter(full_part=="FT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned Rate") +
ylab( "Density")+
ggtitle(" Hours Earned Rate of Full-time Students")
Hours Earned Rate of part time students
df_MCPS20D%>%filter(full_part=="PT")%>%
filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
facet_wrap(~race)+
xlab("Hours Earned Rate") +
ylab( "Density")+
ggtitle(" Hours Earned Rate of Part-time Students")